{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Leveraging TabPFN for Human Activity Recognition Using Mobile Dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Table of Contents <a class=\"anchor\" id=\"0\"></a>\n",
    "* [Introduction](#1) \n",
    "* [Necessary imports](#2)\n",
    "* [Connect to ArcGIS](#3)\n",
    "* [Access the datasets](#4) \n",
    "* [Prepare training data for TabPFN](#5)\n",
    "    * [Data preprocessing for TabPFN classifier model](#6) \n",
    "    * [Visualize training data](#9)\n",
    "* [Model training](#10) \n",
    "    * [Define the TabPFN classifier model ](#11)\n",
    "    * [Fit the model](#12)\n",
    "    * [Visualize results in validation set](#13)\n",
    "* [Predict using TabPFN classifier model](#14)\n",
    "* [Accuracy assessment: Compute model metric](#16)\n",
    "* [Conclusion](#17)\n",
    "* [TabPFN license information](#18) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction <a class=\"anchor\" id=\"1\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Human Activity Recognition (HAR) using mobile data has become an important area of research and application due to the increasing ubiquity of smartphones, wearables, and other mobile devices that can collect a wealth of sensor data. HAR is a crucial task in various fields, including healthcare, fitness, workplace safety, and smart cities, where the goal is to classify human activities (e.g., walking, running, sitting) based on sensor data. Traditional methods for HAR often require substantial computational resources and complex hyperparameter tuning, making them difficult to deploy in real-time applications. TabPFN (Tabular Prior-Data Fitted Network), a Transformer-based model designed for fast and efficient classification of small tabular datasets, offers a promising solution to overcome these challenges.\n",
    "\n",
    "TabPFN’s advantages are particularly well-suited for various HAR use cases. In healthcare, it aids in fall detection for the elderly, chronic disease monitoring, providing timely interventions. For fitness and wellness, it can classify activities such as walking or running in real-time, enhancing user experience in mobile apps and wearable devices. It enhances workplace safety by identifying risky workers' activities in hazardous industrial environments, such as in mining and on oil rigs, ensuring safety and reducing accidents. Furthermore, in the case of smart cities and urban mobility, HAR data from pedestrians and commuters can be efficiently classified to optimize traffic flow, public transport systems, and urban planning initiatives. Additionally, HAR supports emergency response efforts during disasters by locating people in need of help. TabPFN's speed, simplicity, and effectiveness make it an ideal choice for these real-time HAR applications."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Necessary imports <a class=\"anchor\" id=\"2\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: total: 0 ns\n",
      "Wall time: 1.01 ms\n"
     ]
    }
   ],
   "source": [
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "import pandas as pd\n",
    "from sklearn.discriminant_analysis import LinearDiscriminantAnalysis\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report\n",
    "\n",
    "from arcgis.gis import GIS\n",
    "from arcgis.learn import MLModel, prepare_tabulardata"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Connect to ArcGIS <a class=\"anchor\" id=\"3\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "gis = GIS(\"home\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Access the datasets <a class=\"anchor\" id=\"4\"></a>\n",
    "\n",
    "Here we will access the train and test datasets. The Human Activity Recognition (HAR) training dataset consists of 1,020 rows and 561 features, capturing sensor data from mobile devices to classify human activities like walking, running, and sitting. The data includes measurements from accelerometers, gyroscopes, and GPS, providing insights into movement patterns while ensuring that location data remains anonymized for privacy protection. Features such as BodyAcc (body accelerometer), GravityAcc (gravity accelerometer), BodyAccJerk, BodyGyro (body gyroscope), and BodyGyroJerk are used to capture dynamic and rotational movements. Time-domain and frequency-domain features are extracted from these raw signals, helping to distinguish between various activities based on patterns in acceleration, rotation, and speed, making the dataset ideal for activity classification tasks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div class=\"item_container\" style=\"height: auto; overflow: hidden; border: 1px solid #cfcfcf; border-radius: 2px; background: #f6fafa; line-height: 1.21429em; padding: 10px;\">\n",
       "                    <div class=\"item_left\" style=\"width: 210px; float: left;\">\n",
       "                       <a href='https://geosaurus.maps.arcgis.com/home/item.html?id=1fafacc88bc3491696f981758a72de50' target='_blank'>\n",
       "                        <img src='' width='200' height='133' class=\"itemThumbnail\">\n",
       "                       </a>\n",
       "                    </div>\n",
       "\n",
       "                    <div class=\"item_right\"     style=\"float: none; width: auto; overflow: hidden;\">\n",
       "                        <a href='https://geosaurus.maps.arcgis.com/home/item.html?id=1fafacc88bc3491696f981758a72de50' target='_blank'><b>train_har_dataset</b>\n",
       "                        </a>\n",
       "                        <br/>HAR dataset<br/><img src='https://geosaurus.maps.arcgis.com/home/js/arcgisonline/img/item-types/datafiles16.svg' style=\"vertical-align:middle;\" width=16 height=16>CSV by api_data_owner\n",
       "                        <br/>Last Modified: January 10, 2025\n",
       "                        <br/>0 comments, 3 views\n",
       "                    </div>\n",
       "                </div>\n",
       "                "
      ],
      "text/plain": [
       "<Item title:\"train_har_dataset\" type:CSV owner:api_data_owner>"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# access the training data\n",
    "data_table = gis.content.get('1fafacc88bc3491696f981758a72de50')\n",
    "data_table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download the train data and saving it in local folder\n",
    "data_path = data_table.get_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tBodyAcc-mean()-X</th>\n",
       "      <th>tBodyAcc-mean()-Y</th>\n",
       "      <th>tBodyAcc-mean()-Z</th>\n",
       "      <th>tBodyAcc-std()-X</th>\n",
       "      <th>tBodyAcc-std()-Y</th>\n",
       "      <th>tBodyAcc-std()-Z</th>\n",
       "      <th>tBodyAcc-mad()-X</th>\n",
       "      <th>tBodyAcc-mad()-Y</th>\n",
       "      <th>tBodyAcc-mad()-Z</th>\n",
       "      <th>tBodyAcc-max()-X</th>\n",
       "      <th>...</th>\n",
       "      <th>fBodyBodyGyroJerkMag-kurtosis()</th>\n",
       "      <th>angle(tBodyAccMean,gravity)</th>\n",
       "      <th>angle(tBodyAccJerkMean),gravityMean)</th>\n",
       "      <th>angle(tBodyGyroMean,gravityMean)</th>\n",
       "      <th>angle(tBodyGyroJerkMean,gravityMean)</th>\n",
       "      <th>angle(X,gravityMean)</th>\n",
       "      <th>angle(Y,gravityMean)</th>\n",
       "      <th>angle(Z,gravityMean)</th>\n",
       "      <th>subject</th>\n",
       "      <th>Activity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.271144</td>\n",
       "      <td>-0.033031</td>\n",
       "      <td>-0.121829</td>\n",
       "      <td>-0.987884</td>\n",
       "      <td>-0.867081</td>\n",
       "      <td>-0.945087</td>\n",
       "      <td>-0.991858</td>\n",
       "      <td>-0.906651</td>\n",
       "      <td>-0.943951</td>\n",
       "      <td>-0.909793</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.685932</td>\n",
       "      <td>0.007629</td>\n",
       "      <td>0.068842</td>\n",
       "      <td>-0.762768</td>\n",
       "      <td>-0.751408</td>\n",
       "      <td>-0.786647</td>\n",
       "      <td>0.234417</td>\n",
       "      <td>-0.040345</td>\n",
       "      <td>22</td>\n",
       "      <td>STANDING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.278211</td>\n",
       "      <td>-0.020855</td>\n",
       "      <td>-0.103400</td>\n",
       "      <td>-0.996593</td>\n",
       "      <td>-0.980402</td>\n",
       "      <td>-0.988998</td>\n",
       "      <td>-0.997065</td>\n",
       "      <td>-0.977596</td>\n",
       "      <td>-0.988861</td>\n",
       "      <td>-0.941245</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.771901</td>\n",
       "      <td>0.001727</td>\n",
       "      <td>0.295551</td>\n",
       "      <td>-0.035877</td>\n",
       "      <td>-0.360496</td>\n",
       "      <td>-0.661464</td>\n",
       "      <td>0.221240</td>\n",
       "      <td>0.223323</td>\n",
       "      <td>17</td>\n",
       "      <td>STANDING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.276012</td>\n",
       "      <td>-0.015713</td>\n",
       "      <td>-0.103117</td>\n",
       "      <td>-0.982340</td>\n",
       "      <td>-0.834824</td>\n",
       "      <td>-0.973649</td>\n",
       "      <td>-0.986465</td>\n",
       "      <td>-0.862017</td>\n",
       "      <td>-0.976193</td>\n",
       "      <td>-0.914101</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.206414</td>\n",
       "      <td>0.127391</td>\n",
       "      <td>0.028581</td>\n",
       "      <td>0.053358</td>\n",
       "      <td>0.637500</td>\n",
       "      <td>-0.826721</td>\n",
       "      <td>0.212775</td>\n",
       "      <td>-0.018280</td>\n",
       "      <td>25</td>\n",
       "      <td>STANDING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.272753</td>\n",
       "      <td>-0.016910</td>\n",
       "      <td>-0.101737</td>\n",
       "      <td>-0.997409</td>\n",
       "      <td>-0.996203</td>\n",
       "      <td>-0.983416</td>\n",
       "      <td>-0.997425</td>\n",
       "      <td>-0.996439</td>\n",
       "      <td>-0.984400</td>\n",
       "      <td>-0.944557</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.894735</td>\n",
       "      <td>0.096469</td>\n",
       "      <td>0.319319</td>\n",
       "      <td>0.229398</td>\n",
       "      <td>0.267721</td>\n",
       "      <td>-0.672092</td>\n",
       "      <td>0.195500</td>\n",
       "      <td>-0.194527</td>\n",
       "      <td>16</td>\n",
       "      <td>SITTING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.275565</td>\n",
       "      <td>-0.014967</td>\n",
       "      <td>-0.107715</td>\n",
       "      <td>-0.995365</td>\n",
       "      <td>-0.988601</td>\n",
       "      <td>-0.988218</td>\n",
       "      <td>-0.995880</td>\n",
       "      <td>-0.989519</td>\n",
       "      <td>-0.986747</td>\n",
       "      <td>-0.937645</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.888258</td>\n",
       "      <td>0.152336</td>\n",
       "      <td>0.217308</td>\n",
       "      <td>-0.377648</td>\n",
       "      <td>0.733588</td>\n",
       "      <td>-0.749599</td>\n",
       "      <td>0.048129</td>\n",
       "      <td>-0.156654</td>\n",
       "      <td>11</td>\n",
       "      <td>SITTING</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 563 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   tBodyAcc-mean()-X  tBodyAcc-mean()-Y  tBodyAcc-mean()-Z  tBodyAcc-std()-X  \\\n",
       "0           0.271144          -0.033031          -0.121829         -0.987884   \n",
       "1           0.278211          -0.020855          -0.103400         -0.996593   \n",
       "2           0.276012          -0.015713          -0.103117         -0.982340   \n",
       "3           0.272753          -0.016910          -0.101737         -0.997409   \n",
       "4           0.275565          -0.014967          -0.107715         -0.995365   \n",
       "\n",
       "   tBodyAcc-std()-Y  tBodyAcc-std()-Z  tBodyAcc-mad()-X  tBodyAcc-mad()-Y  \\\n",
       "0         -0.867081         -0.945087         -0.991858         -0.906651   \n",
       "1         -0.980402         -0.988998         -0.997065         -0.977596   \n",
       "2         -0.834824         -0.973649         -0.986465         -0.862017   \n",
       "3         -0.996203         -0.983416         -0.997425         -0.996439   \n",
       "4         -0.988601         -0.988218         -0.995880         -0.989519   \n",
       "\n",
       "   tBodyAcc-mad()-Z  tBodyAcc-max()-X  ...  fBodyBodyGyroJerkMag-kurtosis()  \\\n",
       "0         -0.943951         -0.909793  ...                        -0.685932   \n",
       "1         -0.988861         -0.941245  ...                        -0.771901   \n",
       "2         -0.976193         -0.914101  ...                        -0.206414   \n",
       "3         -0.984400         -0.944557  ...                        -0.894735   \n",
       "4         -0.986747         -0.937645  ...                        -0.888258   \n",
       "\n",
       "   angle(tBodyAccMean,gravity)  angle(tBodyAccJerkMean),gravityMean)  \\\n",
       "0                     0.007629                              0.068842   \n",
       "1                     0.001727                              0.295551   \n",
       "2                     0.127391                              0.028581   \n",
       "3                     0.096469                              0.319319   \n",
       "4                     0.152336                              0.217308   \n",
       "\n",
       "   angle(tBodyGyroMean,gravityMean)  angle(tBodyGyroJerkMean,gravityMean)  \\\n",
       "0                         -0.762768                             -0.751408   \n",
       "1                         -0.035877                             -0.360496   \n",
       "2                          0.053358                              0.637500   \n",
       "3                          0.229398                              0.267721   \n",
       "4                         -0.377648                              0.733588   \n",
       "\n",
       "   angle(X,gravityMean)  angle(Y,gravityMean)  angle(Z,gravityMean)  subject  \\\n",
       "0             -0.786647              0.234417             -0.040345       22   \n",
       "1             -0.661464              0.221240              0.223323       17   \n",
       "2             -0.826721              0.212775             -0.018280       25   \n",
       "3             -0.672092              0.195500             -0.194527       16   \n",
       "4             -0.749599              0.048129             -0.156654       11   \n",
       "\n",
       "   Activity  \n",
       "0  STANDING  \n",
       "1  STANDING  \n",
       "2  STANDING  \n",
       "3   SITTING  \n",
       "4   SITTING  \n",
       "\n",
       "[5 rows x 563 columns]"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Read the downloaded data\n",
    "train_har_data = pd.read_csv(data_path)\n",
    "train_har_data.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(1020, 563)"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_har_data.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we will access the test dataset, which is significantly larger, containing 6,332 samples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div class=\"item_container\" style=\"height: auto; overflow: hidden; border: 1px solid #cfcfcf; border-radius: 2px; background: #f6fafa; line-height: 1.21429em; padding: 10px;\">\n",
       "                    <div class=\"item_left\" style=\"width: 210px; float: left;\">\n",
       "                       <a href='https://geosaurus.maps.arcgis.com/home/item.html?id=e65312babe5b4efbaa2842235b79f653' target='_blank'>\n",
       "                        <img src='' width='200' height='133' class=\"itemThumbnail\">\n",
       "                       </a>\n",
       "                    </div>\n",
       "\n",
       "                    <div class=\"item_right\"     style=\"float: none; width: auto; overflow: hidden;\">\n",
       "                        <a href='https://geosaurus.maps.arcgis.com/home/item.html?id=e65312babe5b4efbaa2842235b79f653' target='_blank'><b>test_har_dataset</b>\n",
       "                        </a>\n",
       "                        <br/>HAR dataset<br/><img src='https://geosaurus.maps.arcgis.com/home/js/arcgisonline/img/item-types/datafiles16.svg' style=\"vertical-align:middle;\" width=16 height=16>CSV by api_data_owner\n",
       "                        <br/>Last Modified: January 10, 2025\n",
       "                        <br/>0 comments, 0 views\n",
       "                    </div>\n",
       "                </div>\n",
       "                "
      ],
      "text/plain": [
       "<Item title:\"test_har_dataset\" type:CSV owner:api_data_owner>"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# access the test data\n",
    "test_data_table = gis.content.get('e65312babe5b4efbaa2842235b79f653')\n",
    "test_data_table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download the test data and save it to a local folder\n",
    "test_data_path = test_data_table.get_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tBodyAcc-mean()-X</th>\n",
       "      <th>tBodyAcc-mean()-Y</th>\n",
       "      <th>tBodyAcc-mean()-Z</th>\n",
       "      <th>tBodyAcc-std()-X</th>\n",
       "      <th>tBodyAcc-std()-Y</th>\n",
       "      <th>tBodyAcc-std()-Z</th>\n",
       "      <th>tBodyAcc-mad()-X</th>\n",
       "      <th>tBodyAcc-mad()-Y</th>\n",
       "      <th>tBodyAcc-mad()-Z</th>\n",
       "      <th>tBodyAcc-max()-X</th>\n",
       "      <th>...</th>\n",
       "      <th>fBodyBodyGyroJerkMag-kurtosis()</th>\n",
       "      <th>angle(tBodyAccMean,gravity)</th>\n",
       "      <th>angle(tBodyAccJerkMean),gravityMean)</th>\n",
       "      <th>angle(tBodyGyroMean,gravityMean)</th>\n",
       "      <th>angle(tBodyGyroJerkMean,gravityMean)</th>\n",
       "      <th>angle(X,gravityMean)</th>\n",
       "      <th>angle(Y,gravityMean)</th>\n",
       "      <th>angle(Z,gravityMean)</th>\n",
       "      <th>subject</th>\n",
       "      <th>Activity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.288585</td>\n",
       "      <td>-0.020294</td>\n",
       "      <td>-0.132905</td>\n",
       "      <td>-0.995279</td>\n",
       "      <td>-0.983111</td>\n",
       "      <td>-0.913526</td>\n",
       "      <td>-0.995112</td>\n",
       "      <td>-0.983185</td>\n",
       "      <td>-0.923527</td>\n",
       "      <td>-0.934724</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.710304</td>\n",
       "      <td>-0.112754</td>\n",
       "      <td>0.030400</td>\n",
       "      <td>-0.464761</td>\n",
       "      <td>-0.018446</td>\n",
       "      <td>-0.841247</td>\n",
       "      <td>0.179941</td>\n",
       "      <td>-0.058627</td>\n",
       "      <td>1</td>\n",
       "      <td>STANDING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.278419</td>\n",
       "      <td>-0.016411</td>\n",
       "      <td>-0.123520</td>\n",
       "      <td>-0.998245</td>\n",
       "      <td>-0.975300</td>\n",
       "      <td>-0.960322</td>\n",
       "      <td>-0.998807</td>\n",
       "      <td>-0.974914</td>\n",
       "      <td>-0.957686</td>\n",
       "      <td>-0.943068</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.861499</td>\n",
       "      <td>0.053477</td>\n",
       "      <td>-0.007435</td>\n",
       "      <td>-0.732626</td>\n",
       "      <td>0.703511</td>\n",
       "      <td>-0.844788</td>\n",
       "      <td>0.180289</td>\n",
       "      <td>-0.054317</td>\n",
       "      <td>1</td>\n",
       "      <td>STANDING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.276629</td>\n",
       "      <td>-0.016570</td>\n",
       "      <td>-0.115362</td>\n",
       "      <td>-0.998139</td>\n",
       "      <td>-0.980817</td>\n",
       "      <td>-0.990482</td>\n",
       "      <td>-0.998321</td>\n",
       "      <td>-0.979672</td>\n",
       "      <td>-0.990441</td>\n",
       "      <td>-0.942469</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.699205</td>\n",
       "      <td>0.123320</td>\n",
       "      <td>0.122542</td>\n",
       "      <td>0.693578</td>\n",
       "      <td>-0.615971</td>\n",
       "      <td>-0.847865</td>\n",
       "      <td>0.185151</td>\n",
       "      <td>-0.043892</td>\n",
       "      <td>1</td>\n",
       "      <td>STANDING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.277293</td>\n",
       "      <td>-0.021751</td>\n",
       "      <td>-0.120751</td>\n",
       "      <td>-0.997328</td>\n",
       "      <td>-0.961245</td>\n",
       "      <td>-0.983672</td>\n",
       "      <td>-0.997596</td>\n",
       "      <td>-0.957236</td>\n",
       "      <td>-0.984379</td>\n",
       "      <td>-0.940598</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.572995</td>\n",
       "      <td>0.012954</td>\n",
       "      <td>0.080936</td>\n",
       "      <td>-0.234313</td>\n",
       "      <td>0.117797</td>\n",
       "      <td>-0.847971</td>\n",
       "      <td>0.188982</td>\n",
       "      <td>-0.037364</td>\n",
       "      <td>1</td>\n",
       "      <td>STANDING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.277175</td>\n",
       "      <td>-0.014713</td>\n",
       "      <td>-0.106756</td>\n",
       "      <td>-0.999188</td>\n",
       "      <td>-0.990526</td>\n",
       "      <td>-0.993365</td>\n",
       "      <td>-0.999211</td>\n",
       "      <td>-0.990687</td>\n",
       "      <td>-0.992168</td>\n",
       "      <td>-0.943323</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.765901</td>\n",
       "      <td>0.105620</td>\n",
       "      <td>-0.090278</td>\n",
       "      <td>-0.132403</td>\n",
       "      <td>0.498814</td>\n",
       "      <td>-0.849773</td>\n",
       "      <td>0.188812</td>\n",
       "      <td>-0.035063</td>\n",
       "      <td>1</td>\n",
       "      <td>STANDING</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 563 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   tBodyAcc-mean()-X  tBodyAcc-mean()-Y  tBodyAcc-mean()-Z  tBodyAcc-std()-X  \\\n",
       "0           0.288585          -0.020294          -0.132905         -0.995279   \n",
       "1           0.278419          -0.016411          -0.123520         -0.998245   \n",
       "2           0.276629          -0.016570          -0.115362         -0.998139   \n",
       "3           0.277293          -0.021751          -0.120751         -0.997328   \n",
       "4           0.277175          -0.014713          -0.106756         -0.999188   \n",
       "\n",
       "   tBodyAcc-std()-Y  tBodyAcc-std()-Z  tBodyAcc-mad()-X  tBodyAcc-mad()-Y  \\\n",
       "0         -0.983111         -0.913526         -0.995112         -0.983185   \n",
       "1         -0.975300         -0.960322         -0.998807         -0.974914   \n",
       "2         -0.980817         -0.990482         -0.998321         -0.979672   \n",
       "3         -0.961245         -0.983672         -0.997596         -0.957236   \n",
       "4         -0.990526         -0.993365         -0.999211         -0.990687   \n",
       "\n",
       "   tBodyAcc-mad()-Z  tBodyAcc-max()-X  ...  fBodyBodyGyroJerkMag-kurtosis()  \\\n",
       "0         -0.923527         -0.934724  ...                        -0.710304   \n",
       "1         -0.957686         -0.943068  ...                        -0.861499   \n",
       "2         -0.990441         -0.942469  ...                        -0.699205   \n",
       "3         -0.984379         -0.940598  ...                        -0.572995   \n",
       "4         -0.992168         -0.943323  ...                        -0.765901   \n",
       "\n",
       "   angle(tBodyAccMean,gravity)  angle(tBodyAccJerkMean),gravityMean)  \\\n",
       "0                    -0.112754                              0.030400   \n",
       "1                     0.053477                             -0.007435   \n",
       "2                     0.123320                              0.122542   \n",
       "3                     0.012954                              0.080936   \n",
       "4                     0.105620                             -0.090278   \n",
       "\n",
       "   angle(tBodyGyroMean,gravityMean)  angle(tBodyGyroJerkMean,gravityMean)  \\\n",
       "0                         -0.464761                             -0.018446   \n",
       "1                         -0.732626                              0.703511   \n",
       "2                          0.693578                             -0.615971   \n",
       "3                         -0.234313                              0.117797   \n",
       "4                         -0.132403                              0.498814   \n",
       "\n",
       "   angle(X,gravityMean)  angle(Y,gravityMean)  angle(Z,gravityMean)  subject  \\\n",
       "0             -0.841247              0.179941             -0.058627        1   \n",
       "1             -0.844788              0.180289             -0.054317        1   \n",
       "2             -0.847865              0.185151             -0.043892        1   \n",
       "3             -0.847971              0.188982             -0.037364        1   \n",
       "4             -0.849773              0.188812             -0.035063        1   \n",
       "\n",
       "   Activity  \n",
       "0  STANDING  \n",
       "1  STANDING  \n",
       "2  STANDING  \n",
       "3  STANDING  \n",
       "4  STANDING  \n",
       "\n",
       "[5 rows x 563 columns]"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# read the test data\n",
    "test_har_data = pd.read_csv(test_data_path).drop([\"Unnamed: 0\"], axis=1)\n",
    "test_har_data.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(6332, 563)"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test_har_data.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prepare training data for TabPFN <a class=\"anchor\" id=\"5\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "# View column names except the columns - 'subject','Activity'\n",
    "ls = list(train_har_data.columns)\n",
    "X = [item for item in ls if item not in ['subject','Activity']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['tBodyAcc-mean()-X',\n",
       " 'tBodyAcc-mean()-Y',\n",
       " 'tBodyAcc-mean()-Z',\n",
       " 'tBodyAcc-std()-X',\n",
       " 'tBodyAcc-std()-Y',\n",
       " 'tBodyAcc-std()-Z',\n",
       " 'tBodyAcc-mad()-X',\n",
       " 'tBodyAcc-mad()-Y',\n",
       " 'tBodyAcc-mad()-Z',\n",
       " 'tBodyAcc-max()-X',\n",
       " 'tBodyAcc-max()-Y',\n",
       " 'tBodyAcc-max()-Z',\n",
       " 'tBodyAcc-min()-X',\n",
       " 'tBodyAcc-min()-Y',\n",
       " 'tBodyAcc-min()-Z',\n",
       " 'tBodyAcc-sma()',\n",
       " 'tBodyAcc-energy()-X',\n",
       " 'tBodyAcc-energy()-Y',\n",
       " 'tBodyAcc-energy()-Z',\n",
       " 'tBodyAcc-iqr()-X',\n",
       " 'tBodyAcc-iqr()-Y',\n",
       " 'tBodyAcc-iqr()-Z',\n",
       " 'tBodyAcc-entropy()-X',\n",
       " 'tBodyAcc-entropy()-Y',\n",
       " 'tBodyAcc-entropy()-Z',\n",
       " 'tBodyAcc-arCoeff()-X,1',\n",
       " 'tBodyAcc-arCoeff()-X,2',\n",
       " 'tBodyAcc-arCoeff()-X,3',\n",
       " 'tBodyAcc-arCoeff()-X,4',\n",
       " 'tBodyAcc-arCoeff()-Y,1',\n",
       " 'tBodyAcc-arCoeff()-Y,2',\n",
       " 'tBodyAcc-arCoeff()-Y,3',\n",
       " 'tBodyAcc-arCoeff()-Y,4',\n",
       " 'tBodyAcc-arCoeff()-Z,1',\n",
       " 'tBodyAcc-arCoeff()-Z,2',\n",
       " 'tBodyAcc-arCoeff()-Z,3',\n",
       " 'tBodyAcc-arCoeff()-Z,4',\n",
       " 'tBodyAcc-correlation()-X,Y',\n",
       " 'tBodyAcc-correlation()-X,Z',\n",
       " 'tBodyAcc-correlation()-Y,Z',\n",
       " 'tGravityAcc-mean()-X',\n",
       " 'tGravityAcc-mean()-Y',\n",
       " 'tGravityAcc-mean()-Z',\n",
       " 'tGravityAcc-std()-X',\n",
       " 'tGravityAcc-std()-Y',\n",
       " 'tGravityAcc-std()-Z',\n",
       " 'tGravityAcc-mad()-X',\n",
       " 'tGravityAcc-mad()-Y',\n",
       " 'tGravityAcc-mad()-Z',\n",
       " 'tGravityAcc-max()-X',\n",
       " 'tGravityAcc-max()-Y',\n",
       " 'tGravityAcc-max()-Z',\n",
       " 'tGravityAcc-min()-X',\n",
       " 'tGravityAcc-min()-Y',\n",
       " 'tGravityAcc-min()-Z',\n",
       " 'tGravityAcc-sma()',\n",
       " 'tGravityAcc-energy()-X',\n",
       " 'tGravityAcc-energy()-Y',\n",
       " 'tGravityAcc-energy()-Z',\n",
       " 'tGravityAcc-iqr()-X',\n",
       " 'tGravityAcc-iqr()-Y',\n",
       " 'tGravityAcc-iqr()-Z',\n",
       " 'tGravityAcc-entropy()-X',\n",
       " 'tGravityAcc-entropy()-Y',\n",
       " 'tGravityAcc-entropy()-Z',\n",
       " 'tGravityAcc-arCoeff()-X,1',\n",
       " 'tGravityAcc-arCoeff()-X,2',\n",
       " 'tGravityAcc-arCoeff()-X,3',\n",
       " 'tGravityAcc-arCoeff()-X,4',\n",
       " 'tGravityAcc-arCoeff()-Y,1',\n",
       " 'tGravityAcc-arCoeff()-Y,2',\n",
       " 'tGravityAcc-arCoeff()-Y,3',\n",
       " 'tGravityAcc-arCoeff()-Y,4',\n",
       " 'tGravityAcc-arCoeff()-Z,1',\n",
       " 'tGravityAcc-arCoeff()-Z,2',\n",
       " 'tGravityAcc-arCoeff()-Z,3',\n",
       " 'tGravityAcc-arCoeff()-Z,4',\n",
       " 'tGravityAcc-correlation()-X,Y',\n",
       " 'tGravityAcc-correlation()-X,Z',\n",
       " 'tGravityAcc-correlation()-Y,Z',\n",
       " 'tBodyAccJerk-mean()-X',\n",
       " 'tBodyAccJerk-mean()-Y',\n",
       " 'tBodyAccJerk-mean()-Z',\n",
       " 'tBodyAccJerk-std()-X',\n",
       " 'tBodyAccJerk-std()-Y',\n",
       " 'tBodyAccJerk-std()-Z',\n",
       " 'tBodyAccJerk-mad()-X',\n",
       " 'tBodyAccJerk-mad()-Y',\n",
       " 'tBodyAccJerk-mad()-Z',\n",
       " 'tBodyAccJerk-max()-X',\n",
       " 'tBodyAccJerk-max()-Y',\n",
       " 'tBodyAccJerk-max()-Z',\n",
       " 'tBodyAccJerk-min()-X',\n",
       " 'tBodyAccJerk-min()-Y',\n",
       " 'tBodyAccJerk-min()-Z',\n",
       " 'tBodyAccJerk-sma()',\n",
       " 'tBodyAccJerk-energy()-X',\n",
       " 'tBodyAccJerk-energy()-Y',\n",
       " 'tBodyAccJerk-energy()-Z',\n",
       " 'tBodyAccJerk-iqr()-X',\n",
       " 'tBodyAccJerk-iqr()-Y',\n",
       " 'tBodyAccJerk-iqr()-Z',\n",
       " 'tBodyAccJerk-entropy()-X',\n",
       " 'tBodyAccJerk-entropy()-Y',\n",
       " 'tBodyAccJerk-entropy()-Z',\n",
       " 'tBodyAccJerk-arCoeff()-X,1',\n",
       " 'tBodyAccJerk-arCoeff()-X,2',\n",
       " 'tBodyAccJerk-arCoeff()-X,3',\n",
       " 'tBodyAccJerk-arCoeff()-X,4',\n",
       " 'tBodyAccJerk-arCoeff()-Y,1',\n",
       " 'tBodyAccJerk-arCoeff()-Y,2',\n",
       " 'tBodyAccJerk-arCoeff()-Y,3',\n",
       " 'tBodyAccJerk-arCoeff()-Y,4',\n",
       " 'tBodyAccJerk-arCoeff()-Z,1',\n",
       " 'tBodyAccJerk-arCoeff()-Z,2',\n",
       " 'tBodyAccJerk-arCoeff()-Z,3',\n",
       " 'tBodyAccJerk-arCoeff()-Z,4',\n",
       " 'tBodyAccJerk-correlation()-X,Y',\n",
       " 'tBodyAccJerk-correlation()-X,Z',\n",
       " 'tBodyAccJerk-correlation()-Y,Z',\n",
       " 'tBodyGyro-mean()-X',\n",
       " 'tBodyGyro-mean()-Y',\n",
       " 'tBodyGyro-mean()-Z',\n",
       " 'tBodyGyro-std()-X',\n",
       " 'tBodyGyro-std()-Y',\n",
       " 'tBodyGyro-std()-Z',\n",
       " 'tBodyGyro-mad()-X',\n",
       " 'tBodyGyro-mad()-Y',\n",
       " 'tBodyGyro-mad()-Z',\n",
       " 'tBodyGyro-max()-X',\n",
       " 'tBodyGyro-max()-Y',\n",
       " 'tBodyGyro-max()-Z',\n",
       " 'tBodyGyro-min()-X',\n",
       " 'tBodyGyro-min()-Y',\n",
       " 'tBodyGyro-min()-Z',\n",
       " 'tBodyGyro-sma()',\n",
       " 'tBodyGyro-energy()-X',\n",
       " 'tBodyGyro-energy()-Y',\n",
       " 'tBodyGyro-energy()-Z',\n",
       " 'tBodyGyro-iqr()-X',\n",
       " 'tBodyGyro-iqr()-Y',\n",
       " 'tBodyGyro-iqr()-Z',\n",
       " 'tBodyGyro-entropy()-X',\n",
       " 'tBodyGyro-entropy()-Y',\n",
       " 'tBodyGyro-entropy()-Z',\n",
       " 'tBodyGyro-arCoeff()-X,1',\n",
       " 'tBodyGyro-arCoeff()-X,2',\n",
       " 'tBodyGyro-arCoeff()-X,3',\n",
       " 'tBodyGyro-arCoeff()-X,4',\n",
       " 'tBodyGyro-arCoeff()-Y,1',\n",
       " 'tBodyGyro-arCoeff()-Y,2',\n",
       " 'tBodyGyro-arCoeff()-Y,3',\n",
       " 'tBodyGyro-arCoeff()-Y,4',\n",
       " 'tBodyGyro-arCoeff()-Z,1',\n",
       " 'tBodyGyro-arCoeff()-Z,2',\n",
       " 'tBodyGyro-arCoeff()-Z,3',\n",
       " 'tBodyGyro-arCoeff()-Z,4',\n",
       " 'tBodyGyro-correlation()-X,Y',\n",
       " 'tBodyGyro-correlation()-X,Z',\n",
       " 'tBodyGyro-correlation()-Y,Z',\n",
       " 'tBodyGyroJerk-mean()-X',\n",
       " 'tBodyGyroJerk-mean()-Y',\n",
       " 'tBodyGyroJerk-mean()-Z',\n",
       " 'tBodyGyroJerk-std()-X',\n",
       " 'tBodyGyroJerk-std()-Y',\n",
       " 'tBodyGyroJerk-std()-Z',\n",
       " 'tBodyGyroJerk-mad()-X',\n",
       " 'tBodyGyroJerk-mad()-Y',\n",
       " 'tBodyGyroJerk-mad()-Z',\n",
       " 'tBodyGyroJerk-max()-X',\n",
       " 'tBodyGyroJerk-max()-Y',\n",
       " 'tBodyGyroJerk-max()-Z',\n",
       " 'tBodyGyroJerk-min()-X',\n",
       " 'tBodyGyroJerk-min()-Y',\n",
       " 'tBodyGyroJerk-min()-Z',\n",
       " 'tBodyGyroJerk-sma()',\n",
       " 'tBodyGyroJerk-energy()-X',\n",
       " 'tBodyGyroJerk-energy()-Y',\n",
       " 'tBodyGyroJerk-energy()-Z',\n",
       " 'tBodyGyroJerk-iqr()-X',\n",
       " 'tBodyGyroJerk-iqr()-Y',\n",
       " 'tBodyGyroJerk-iqr()-Z',\n",
       " 'tBodyGyroJerk-entropy()-X',\n",
       " 'tBodyGyroJerk-entropy()-Y',\n",
       " 'tBodyGyroJerk-entropy()-Z',\n",
       " 'tBodyGyroJerk-arCoeff()-X,1',\n",
       " 'tBodyGyroJerk-arCoeff()-X,2',\n",
       " 'tBodyGyroJerk-arCoeff()-X,3',\n",
       " 'tBodyGyroJerk-arCoeff()-X,4',\n",
       " 'tBodyGyroJerk-arCoeff()-Y,1',\n",
       " 'tBodyGyroJerk-arCoeff()-Y,2',\n",
       " 'tBodyGyroJerk-arCoeff()-Y,3',\n",
       " 'tBodyGyroJerk-arCoeff()-Y,4',\n",
       " 'tBodyGyroJerk-arCoeff()-Z,1',\n",
       " 'tBodyGyroJerk-arCoeff()-Z,2',\n",
       " 'tBodyGyroJerk-arCoeff()-Z,3',\n",
       " 'tBodyGyroJerk-arCoeff()-Z,4',\n",
       " 'tBodyGyroJerk-correlation()-X,Y',\n",
       " 'tBodyGyroJerk-correlation()-X,Z',\n",
       " 'tBodyGyroJerk-correlation()-Y,Z',\n",
       " 'tBodyAccMag-mean()',\n",
       " 'tBodyAccMag-std()',\n",
       " 'tBodyAccMag-mad()',\n",
       " 'tBodyAccMag-max()',\n",
       " 'tBodyAccMag-min()',\n",
       " 'tBodyAccMag-sma()',\n",
       " 'tBodyAccMag-energy()',\n",
       " 'tBodyAccMag-iqr()',\n",
       " 'tBodyAccMag-entropy()',\n",
       " 'tBodyAccMag-arCoeff()1',\n",
       " 'tBodyAccMag-arCoeff()2',\n",
       " 'tBodyAccMag-arCoeff()3',\n",
       " 'tBodyAccMag-arCoeff()4',\n",
       " 'tGravityAccMag-mean()',\n",
       " 'tGravityAccMag-std()',\n",
       " 'tGravityAccMag-mad()',\n",
       " 'tGravityAccMag-max()',\n",
       " 'tGravityAccMag-min()',\n",
       " 'tGravityAccMag-sma()',\n",
       " 'tGravityAccMag-energy()',\n",
       " 'tGravityAccMag-iqr()',\n",
       " 'tGravityAccMag-entropy()',\n",
       " 'tGravityAccMag-arCoeff()1',\n",
       " 'tGravityAccMag-arCoeff()2',\n",
       " 'tGravityAccMag-arCoeff()3',\n",
       " 'tGravityAccMag-arCoeff()4',\n",
       " 'tBodyAccJerkMag-mean()',\n",
       " 'tBodyAccJerkMag-std()',\n",
       " 'tBodyAccJerkMag-mad()',\n",
       " 'tBodyAccJerkMag-max()',\n",
       " 'tBodyAccJerkMag-min()',\n",
       " 'tBodyAccJerkMag-sma()',\n",
       " 'tBodyAccJerkMag-energy()',\n",
       " 'tBodyAccJerkMag-iqr()',\n",
       " 'tBodyAccJerkMag-entropy()',\n",
       " 'tBodyAccJerkMag-arCoeff()1',\n",
       " 'tBodyAccJerkMag-arCoeff()2',\n",
       " 'tBodyAccJerkMag-arCoeff()3',\n",
       " 'tBodyAccJerkMag-arCoeff()4',\n",
       " 'tBodyGyroMag-mean()',\n",
       " 'tBodyGyroMag-std()',\n",
       " 'tBodyGyroMag-mad()',\n",
       " 'tBodyGyroMag-max()',\n",
       " 'tBodyGyroMag-min()',\n",
       " 'tBodyGyroMag-sma()',\n",
       " 'tBodyGyroMag-energy()',\n",
       " 'tBodyGyroMag-iqr()',\n",
       " 'tBodyGyroMag-entropy()',\n",
       " 'tBodyGyroMag-arCoeff()1',\n",
       " 'tBodyGyroMag-arCoeff()2',\n",
       " 'tBodyGyroMag-arCoeff()3',\n",
       " 'tBodyGyroMag-arCoeff()4',\n",
       " 'tBodyGyroJerkMag-mean()',\n",
       " 'tBodyGyroJerkMag-std()',\n",
       " 'tBodyGyroJerkMag-mad()',\n",
       " 'tBodyGyroJerkMag-max()',\n",
       " 'tBodyGyroJerkMag-min()',\n",
       " 'tBodyGyroJerkMag-sma()',\n",
       " 'tBodyGyroJerkMag-energy()',\n",
       " 'tBodyGyroJerkMag-iqr()',\n",
       " 'tBodyGyroJerkMag-entropy()',\n",
       " 'tBodyGyroJerkMag-arCoeff()1',\n",
       " 'tBodyGyroJerkMag-arCoeff()2',\n",
       " 'tBodyGyroJerkMag-arCoeff()3',\n",
       " 'tBodyGyroJerkMag-arCoeff()4',\n",
       " 'fBodyAcc-mean()-X',\n",
       " 'fBodyAcc-mean()-Y',\n",
       " 'fBodyAcc-mean()-Z',\n",
       " 'fBodyAcc-std()-X',\n",
       " 'fBodyAcc-std()-Y',\n",
       " 'fBodyAcc-std()-Z',\n",
       " 'fBodyAcc-mad()-X',\n",
       " 'fBodyAcc-mad()-Y',\n",
       " 'fBodyAcc-mad()-Z',\n",
       " 'fBodyAcc-max()-X',\n",
       " 'fBodyAcc-max()-Y',\n",
       " 'fBodyAcc-max()-Z',\n",
       " 'fBodyAcc-min()-X',\n",
       " 'fBodyAcc-min()-Y',\n",
       " 'fBodyAcc-min()-Z',\n",
       " 'fBodyAcc-sma()',\n",
       " 'fBodyAcc-energy()-X',\n",
       " 'fBodyAcc-energy()-Y',\n",
       " 'fBodyAcc-energy()-Z',\n",
       " 'fBodyAcc-iqr()-X',\n",
       " 'fBodyAcc-iqr()-Y',\n",
       " 'fBodyAcc-iqr()-Z',\n",
       " 'fBodyAcc-entropy()-X',\n",
       " 'fBodyAcc-entropy()-Y',\n",
       " 'fBodyAcc-entropy()-Z',\n",
       " 'fBodyAcc-maxInds-X',\n",
       " 'fBodyAcc-maxInds-Y',\n",
       " 'fBodyAcc-maxInds-Z',\n",
       " 'fBodyAcc-meanFreq()-X',\n",
       " 'fBodyAcc-meanFreq()-Y',\n",
       " 'fBodyAcc-meanFreq()-Z',\n",
       " 'fBodyAcc-skewness()-X',\n",
       " 'fBodyAcc-kurtosis()-X',\n",
       " 'fBodyAcc-skewness()-Y',\n",
       " 'fBodyAcc-kurtosis()-Y',\n",
       " 'fBodyAcc-skewness()-Z',\n",
       " 'fBodyAcc-kurtosis()-Z',\n",
       " 'fBodyAcc-bandsEnergy()-1,8',\n",
       " 'fBodyAcc-bandsEnergy()-9,16',\n",
       " 'fBodyAcc-bandsEnergy()-17,24',\n",
       " 'fBodyAcc-bandsEnergy()-25,32',\n",
       " 'fBodyAcc-bandsEnergy()-33,40',\n",
       " 'fBodyAcc-bandsEnergy()-41,48',\n",
       " 'fBodyAcc-bandsEnergy()-49,56',\n",
       " 'fBodyAcc-bandsEnergy()-57,64',\n",
       " 'fBodyAcc-bandsEnergy()-1,16',\n",
       " 'fBodyAcc-bandsEnergy()-17,32',\n",
       " 'fBodyAcc-bandsEnergy()-33,48',\n",
       " 'fBodyAcc-bandsEnergy()-49,64',\n",
       " 'fBodyAcc-bandsEnergy()-1,24',\n",
       " 'fBodyAcc-bandsEnergy()-25,48',\n",
       " 'fBodyAcc-bandsEnergy()-1,8.1',\n",
       " 'fBodyAcc-bandsEnergy()-9,16.1',\n",
       " 'fBodyAcc-bandsEnergy()-17,24.1',\n",
       " 'fBodyAcc-bandsEnergy()-25,32.1',\n",
       " 'fBodyAcc-bandsEnergy()-33,40.1',\n",
       " 'fBodyAcc-bandsEnergy()-41,48.1',\n",
       " 'fBodyAcc-bandsEnergy()-49,56.1',\n",
       " 'fBodyAcc-bandsEnergy()-57,64.1',\n",
       " 'fBodyAcc-bandsEnergy()-1,16.1',\n",
       " 'fBodyAcc-bandsEnergy()-17,32.1',\n",
       " 'fBodyAcc-bandsEnergy()-33,48.1',\n",
       " 'fBodyAcc-bandsEnergy()-49,64.1',\n",
       " 'fBodyAcc-bandsEnergy()-1,24.1',\n",
       " 'fBodyAcc-bandsEnergy()-25,48.1',\n",
       " 'fBodyAcc-bandsEnergy()-1,8.2',\n",
       " 'fBodyAcc-bandsEnergy()-9,16.2',\n",
       " 'fBodyAcc-bandsEnergy()-17,24.2',\n",
       " 'fBodyAcc-bandsEnergy()-25,32.2',\n",
       " 'fBodyAcc-bandsEnergy()-33,40.2',\n",
       " 'fBodyAcc-bandsEnergy()-41,48.2',\n",
       " 'fBodyAcc-bandsEnergy()-49,56.2',\n",
       " 'fBodyAcc-bandsEnergy()-57,64.2',\n",
       " 'fBodyAcc-bandsEnergy()-1,16.2',\n",
       " 'fBodyAcc-bandsEnergy()-17,32.2',\n",
       " 'fBodyAcc-bandsEnergy()-33,48.2',\n",
       " 'fBodyAcc-bandsEnergy()-49,64.2',\n",
       " 'fBodyAcc-bandsEnergy()-1,24.2',\n",
       " 'fBodyAcc-bandsEnergy()-25,48.2',\n",
       " 'fBodyAccJerk-mean()-X',\n",
       " 'fBodyAccJerk-mean()-Y',\n",
       " 'fBodyAccJerk-mean()-Z',\n",
       " 'fBodyAccJerk-std()-X',\n",
       " 'fBodyAccJerk-std()-Y',\n",
       " 'fBodyAccJerk-std()-Z',\n",
       " 'fBodyAccJerk-mad()-X',\n",
       " 'fBodyAccJerk-mad()-Y',\n",
       " 'fBodyAccJerk-mad()-Z',\n",
       " 'fBodyAccJerk-max()-X',\n",
       " 'fBodyAccJerk-max()-Y',\n",
       " 'fBodyAccJerk-max()-Z',\n",
       " 'fBodyAccJerk-min()-X',\n",
       " 'fBodyAccJerk-min()-Y',\n",
       " 'fBodyAccJerk-min()-Z',\n",
       " 'fBodyAccJerk-sma()',\n",
       " 'fBodyAccJerk-energy()-X',\n",
       " 'fBodyAccJerk-energy()-Y',\n",
       " 'fBodyAccJerk-energy()-Z',\n",
       " 'fBodyAccJerk-iqr()-X',\n",
       " 'fBodyAccJerk-iqr()-Y',\n",
       " 'fBodyAccJerk-iqr()-Z',\n",
       " 'fBodyAccJerk-entropy()-X',\n",
       " 'fBodyAccJerk-entropy()-Y',\n",
       " 'fBodyAccJerk-entropy()-Z',\n",
       " 'fBodyAccJerk-maxInds-X',\n",
       " 'fBodyAccJerk-maxInds-Y',\n",
       " 'fBodyAccJerk-maxInds-Z',\n",
       " 'fBodyAccJerk-meanFreq()-X',\n",
       " 'fBodyAccJerk-meanFreq()-Y',\n",
       " 'fBodyAccJerk-meanFreq()-Z',\n",
       " 'fBodyAccJerk-skewness()-X',\n",
       " 'fBodyAccJerk-kurtosis()-X',\n",
       " 'fBodyAccJerk-skewness()-Y',\n",
       " 'fBodyAccJerk-kurtosis()-Y',\n",
       " 'fBodyAccJerk-skewness()-Z',\n",
       " 'fBodyAccJerk-kurtosis()-Z',\n",
       " 'fBodyAccJerk-bandsEnergy()-1,8',\n",
       " 'fBodyAccJerk-bandsEnergy()-9,16',\n",
       " 'fBodyAccJerk-bandsEnergy()-17,24',\n",
       " 'fBodyAccJerk-bandsEnergy()-25,32',\n",
       " 'fBodyAccJerk-bandsEnergy()-33,40',\n",
       " 'fBodyAccJerk-bandsEnergy()-41,48',\n",
       " 'fBodyAccJerk-bandsEnergy()-49,56',\n",
       " 'fBodyAccJerk-bandsEnergy()-57,64',\n",
       " 'fBodyAccJerk-bandsEnergy()-1,16',\n",
       " 'fBodyAccJerk-bandsEnergy()-17,32',\n",
       " 'fBodyAccJerk-bandsEnergy()-33,48',\n",
       " 'fBodyAccJerk-bandsEnergy()-49,64',\n",
       " 'fBodyAccJerk-bandsEnergy()-1,24',\n",
       " 'fBodyAccJerk-bandsEnergy()-25,48',\n",
       " 'fBodyAccJerk-bandsEnergy()-1,8.1',\n",
       " 'fBodyAccJerk-bandsEnergy()-9,16.1',\n",
       " 'fBodyAccJerk-bandsEnergy()-17,24.1',\n",
       " 'fBodyAccJerk-bandsEnergy()-25,32.1',\n",
       " 'fBodyAccJerk-bandsEnergy()-33,40.1',\n",
       " 'fBodyAccJerk-bandsEnergy()-41,48.1',\n",
       " 'fBodyAccJerk-bandsEnergy()-49,56.1',\n",
       " 'fBodyAccJerk-bandsEnergy()-57,64.1',\n",
       " 'fBodyAccJerk-bandsEnergy()-1,16.1',\n",
       " 'fBodyAccJerk-bandsEnergy()-17,32.1',\n",
       " 'fBodyAccJerk-bandsEnergy()-33,48.1',\n",
       " 'fBodyAccJerk-bandsEnergy()-49,64.1',\n",
       " 'fBodyAccJerk-bandsEnergy()-1,24.1',\n",
       " 'fBodyAccJerk-bandsEnergy()-25,48.1',\n",
       " 'fBodyAccJerk-bandsEnergy()-1,8.2',\n",
       " 'fBodyAccJerk-bandsEnergy()-9,16.2',\n",
       " 'fBodyAccJerk-bandsEnergy()-17,24.2',\n",
       " 'fBodyAccJerk-bandsEnergy()-25,32.2',\n",
       " 'fBodyAccJerk-bandsEnergy()-33,40.2',\n",
       " 'fBodyAccJerk-bandsEnergy()-41,48.2',\n",
       " 'fBodyAccJerk-bandsEnergy()-49,56.2',\n",
       " 'fBodyAccJerk-bandsEnergy()-57,64.2',\n",
       " 'fBodyAccJerk-bandsEnergy()-1,16.2',\n",
       " 'fBodyAccJerk-bandsEnergy()-17,32.2',\n",
       " 'fBodyAccJerk-bandsEnergy()-33,48.2',\n",
       " 'fBodyAccJerk-bandsEnergy()-49,64.2',\n",
       " 'fBodyAccJerk-bandsEnergy()-1,24.2',\n",
       " 'fBodyAccJerk-bandsEnergy()-25,48.2',\n",
       " 'fBodyGyro-mean()-X',\n",
       " 'fBodyGyro-mean()-Y',\n",
       " 'fBodyGyro-mean()-Z',\n",
       " 'fBodyGyro-std()-X',\n",
       " 'fBodyGyro-std()-Y',\n",
       " 'fBodyGyro-std()-Z',\n",
       " 'fBodyGyro-mad()-X',\n",
       " 'fBodyGyro-mad()-Y',\n",
       " 'fBodyGyro-mad()-Z',\n",
       " 'fBodyGyro-max()-X',\n",
       " 'fBodyGyro-max()-Y',\n",
       " 'fBodyGyro-max()-Z',\n",
       " 'fBodyGyro-min()-X',\n",
       " 'fBodyGyro-min()-Y',\n",
       " 'fBodyGyro-min()-Z',\n",
       " 'fBodyGyro-sma()',\n",
       " 'fBodyGyro-energy()-X',\n",
       " 'fBodyGyro-energy()-Y',\n",
       " 'fBodyGyro-energy()-Z',\n",
       " 'fBodyGyro-iqr()-X',\n",
       " 'fBodyGyro-iqr()-Y',\n",
       " 'fBodyGyro-iqr()-Z',\n",
       " 'fBodyGyro-entropy()-X',\n",
       " 'fBodyGyro-entropy()-Y',\n",
       " 'fBodyGyro-entropy()-Z',\n",
       " 'fBodyGyro-maxInds-X',\n",
       " 'fBodyGyro-maxInds-Y',\n",
       " 'fBodyGyro-maxInds-Z',\n",
       " 'fBodyGyro-meanFreq()-X',\n",
       " 'fBodyGyro-meanFreq()-Y',\n",
       " 'fBodyGyro-meanFreq()-Z',\n",
       " 'fBodyGyro-skewness()-X',\n",
       " 'fBodyGyro-kurtosis()-X',\n",
       " 'fBodyGyro-skewness()-Y',\n",
       " 'fBodyGyro-kurtosis()-Y',\n",
       " 'fBodyGyro-skewness()-Z',\n",
       " 'fBodyGyro-kurtosis()-Z',\n",
       " 'fBodyGyro-bandsEnergy()-1,8',\n",
       " 'fBodyGyro-bandsEnergy()-9,16',\n",
       " 'fBodyGyro-bandsEnergy()-17,24',\n",
       " 'fBodyGyro-bandsEnergy()-25,32',\n",
       " 'fBodyGyro-bandsEnergy()-33,40',\n",
       " 'fBodyGyro-bandsEnergy()-41,48',\n",
       " 'fBodyGyro-bandsEnergy()-49,56',\n",
       " 'fBodyGyro-bandsEnergy()-57,64',\n",
       " 'fBodyGyro-bandsEnergy()-1,16',\n",
       " 'fBodyGyro-bandsEnergy()-17,32',\n",
       " 'fBodyGyro-bandsEnergy()-33,48',\n",
       " 'fBodyGyro-bandsEnergy()-49,64',\n",
       " 'fBodyGyro-bandsEnergy()-1,24',\n",
       " 'fBodyGyro-bandsEnergy()-25,48',\n",
       " 'fBodyGyro-bandsEnergy()-1,8.1',\n",
       " 'fBodyGyro-bandsEnergy()-9,16.1',\n",
       " 'fBodyGyro-bandsEnergy()-17,24.1',\n",
       " 'fBodyGyro-bandsEnergy()-25,32.1',\n",
       " 'fBodyGyro-bandsEnergy()-33,40.1',\n",
       " 'fBodyGyro-bandsEnergy()-41,48.1',\n",
       " 'fBodyGyro-bandsEnergy()-49,56.1',\n",
       " 'fBodyGyro-bandsEnergy()-57,64.1',\n",
       " 'fBodyGyro-bandsEnergy()-1,16.1',\n",
       " 'fBodyGyro-bandsEnergy()-17,32.1',\n",
       " 'fBodyGyro-bandsEnergy()-33,48.1',\n",
       " 'fBodyGyro-bandsEnergy()-49,64.1',\n",
       " 'fBodyGyro-bandsEnergy()-1,24.1',\n",
       " 'fBodyGyro-bandsEnergy()-25,48.1',\n",
       " 'fBodyGyro-bandsEnergy()-1,8.2',\n",
       " 'fBodyGyro-bandsEnergy()-9,16.2',\n",
       " 'fBodyGyro-bandsEnergy()-17,24.2',\n",
       " 'fBodyGyro-bandsEnergy()-25,32.2',\n",
       " 'fBodyGyro-bandsEnergy()-33,40.2',\n",
       " 'fBodyGyro-bandsEnergy()-41,48.2',\n",
       " 'fBodyGyro-bandsEnergy()-49,56.2',\n",
       " 'fBodyGyro-bandsEnergy()-57,64.2',\n",
       " 'fBodyGyro-bandsEnergy()-1,16.2',\n",
       " 'fBodyGyro-bandsEnergy()-17,32.2',\n",
       " 'fBodyGyro-bandsEnergy()-33,48.2',\n",
       " 'fBodyGyro-bandsEnergy()-49,64.2',\n",
       " 'fBodyGyro-bandsEnergy()-1,24.2',\n",
       " 'fBodyGyro-bandsEnergy()-25,48.2',\n",
       " 'fBodyAccMag-mean()',\n",
       " 'fBodyAccMag-std()',\n",
       " 'fBodyAccMag-mad()',\n",
       " 'fBodyAccMag-max()',\n",
       " 'fBodyAccMag-min()',\n",
       " 'fBodyAccMag-sma()',\n",
       " 'fBodyAccMag-energy()',\n",
       " 'fBodyAccMag-iqr()',\n",
       " 'fBodyAccMag-entropy()',\n",
       " 'fBodyAccMag-maxInds',\n",
       " 'fBodyAccMag-meanFreq()',\n",
       " 'fBodyAccMag-skewness()',\n",
       " 'fBodyAccMag-kurtosis()',\n",
       " 'fBodyBodyAccJerkMag-mean()',\n",
       " 'fBodyBodyAccJerkMag-std()',\n",
       " 'fBodyBodyAccJerkMag-mad()',\n",
       " 'fBodyBodyAccJerkMag-max()',\n",
       " 'fBodyBodyAccJerkMag-min()',\n",
       " 'fBodyBodyAccJerkMag-sma()',\n",
       " 'fBodyBodyAccJerkMag-energy()',\n",
       " 'fBodyBodyAccJerkMag-iqr()',\n",
       " 'fBodyBodyAccJerkMag-entropy()',\n",
       " 'fBodyBodyAccJerkMag-maxInds',\n",
       " 'fBodyBodyAccJerkMag-meanFreq()',\n",
       " 'fBodyBodyAccJerkMag-skewness()',\n",
       " 'fBodyBodyAccJerkMag-kurtosis()',\n",
       " 'fBodyBodyGyroMag-mean()',\n",
       " 'fBodyBodyGyroMag-std()',\n",
       " 'fBodyBodyGyroMag-mad()',\n",
       " 'fBodyBodyGyroMag-max()',\n",
       " 'fBodyBodyGyroMag-min()',\n",
       " 'fBodyBodyGyroMag-sma()',\n",
       " 'fBodyBodyGyroMag-energy()',\n",
       " 'fBodyBodyGyroMag-iqr()',\n",
       " 'fBodyBodyGyroMag-entropy()',\n",
       " 'fBodyBodyGyroMag-maxInds',\n",
       " 'fBodyBodyGyroMag-meanFreq()',\n",
       " 'fBodyBodyGyroMag-skewness()',\n",
       " 'fBodyBodyGyroMag-kurtosis()',\n",
       " 'fBodyBodyGyroJerkMag-mean()',\n",
       " 'fBodyBodyGyroJerkMag-std()',\n",
       " 'fBodyBodyGyroJerkMag-mad()',\n",
       " 'fBodyBodyGyroJerkMag-max()',\n",
       " 'fBodyBodyGyroJerkMag-min()',\n",
       " 'fBodyBodyGyroJerkMag-sma()',\n",
       " 'fBodyBodyGyroJerkMag-energy()',\n",
       " 'fBodyBodyGyroJerkMag-iqr()',\n",
       " 'fBodyBodyGyroJerkMag-entropy()',\n",
       " 'fBodyBodyGyroJerkMag-maxInds',\n",
       " 'fBodyBodyGyroJerkMag-meanFreq()',\n",
       " 'fBodyBodyGyroJerkMag-skewness()',\n",
       " 'fBodyBodyGyroJerkMag-kurtosis()',\n",
       " 'angle(tBodyAccMean,gravity)',\n",
       " 'angle(tBodyAccJerkMean),gravityMean)',\n",
       " 'angle(tBodyGyroMean,gravityMean)',\n",
       " 'angle(tBodyGyroJerkMean,gravityMean)',\n",
       " 'angle(X,gravityMean)',\n",
       " 'angle(Y,gravityMean)',\n",
       " 'angle(Z,gravityMean)']"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "561"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data preprocessing for TabPFN classifier model<a class=\"anchor\" id=\"6\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To process the training data for the TabPFN model, we will use Linear Discriminant Analysis (LDA) to reduce the number of features from the original 561 to below the TabPFN model's maximum limit of 100. By applying LDA, we can preserve the most relevant information for classification while reducing the complexity of the input data, making it suitable for the TabPFN model, which requires a compact input format for efficient processing and predictions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(1020, 6)"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Data processing to reduce the features to 100 or less as required for TabPFN models\n",
    "X = train_har_data.drop(columns=['Activity'])\n",
    "y = train_har_data['Activity']\n",
    "scaler = StandardScaler()\n",
    "X_scaled = scaler.fit_transform(X)\n",
    "lda = LinearDiscriminantAnalysis(n_components=min(100, len(set(y)) - 1))\n",
    "X_reduced_lda = lda.fit_transform(X_scaled, y)\n",
    "X_train_lda_df = pd.DataFrame(X_reduced_lda, columns=[f'LDA{i+1}' for i in range(X_reduced_lda.shape[1])])\n",
    "X_train_lda_df['Activity'] = y.reset_index(drop=True)\n",
    "X_train_lda_df.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['LDA1', 'LDA2', 'LDA3', 'LDA4', 'LDA5', 'Activity'], dtype='object')"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Visualize the final processed training data columns\n",
    "X_train_lda_df.columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We define the explanatory variables as follows: In the training dataframe above we use   `Activity` as the target label to be predicted, using the rest of the features as explanatory variables `X`. We define the explanatory variables as follows: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# define the explanatory vairables\n",
    "X = list(X_train_lda_df.columns)\n",
    "X =X[:-1]\n",
    "len(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once the explanatory variables `X` are defined, they are used as input in the `prepare_tabulardata` method from the tabular learner in `arcgis.learn`. The method takes the feature layer or a spatial dataframe containing the dataset and prepares it for fitting the model.\n",
    "\n",
    "The input parameters required for the tool are used as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = prepare_tabulardata(X_train_lda_df, 'Activity', explanatory_variables=X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualize training data <a class=\"anchor\" id=\"9\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get a sense of what the training data looks like, the `show_batch()` method will randomly pick a few training samples and visualize them. The samples show the explanatory variables and the `Activity` target label to predict."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Activity</th>\n",
       "      <th>LDA1</th>\n",
       "      <th>LDA2</th>\n",
       "      <th>LDA3</th>\n",
       "      <th>LDA4</th>\n",
       "      <th>LDA5</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>STANDING</td>\n",
       "      <td>-19.431367</td>\n",
       "      <td>-11.174415</td>\n",
       "      <td>0.816325</td>\n",
       "      <td>-0.687153</td>\n",
       "      <td>2.809134</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>130</th>\n",
       "      <td>SITTING</td>\n",
       "      <td>-18.377005</td>\n",
       "      <td>-9.429310</td>\n",
       "      <td>0.342490</td>\n",
       "      <td>-0.757008</td>\n",
       "      <td>-4.074501</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>311</th>\n",
       "      <td>WALKING</td>\n",
       "      <td>17.852727</td>\n",
       "      <td>-1.390446</td>\n",
       "      <td>-8.164604</td>\n",
       "      <td>4.176480</td>\n",
       "      <td>-0.160302</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>734</th>\n",
       "      <td>WALKING_UPSTAIRS</td>\n",
       "      <td>23.633640</td>\n",
       "      <td>2.301121</td>\n",
       "      <td>2.475455</td>\n",
       "      <td>-10.447339</td>\n",
       "      <td>-0.302447</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>847</th>\n",
       "      <td>LAYING</td>\n",
       "      <td>-26.620913</td>\n",
       "      <td>14.783496</td>\n",
       "      <td>-0.705847</td>\n",
       "      <td>0.511383</td>\n",
       "      <td>-0.568820</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Activity       LDA1       LDA2      LDA3       LDA4      LDA5\n",
       "28           STANDING -19.431367 -11.174415  0.816325  -0.687153  2.809134\n",
       "130           SITTING -18.377005  -9.429310  0.342490  -0.757008 -4.074501\n",
       "311           WALKING  17.852727  -1.390446 -8.164604   4.176480 -0.160302\n",
       "734  WALKING_UPSTAIRS  23.633640   2.301121  2.475455 -10.447339 -0.302447\n",
       "847            LAYING -26.620913  14.783496 -0.705847   0.511383 -0.568820"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.show_batch(rows=5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Model training <a class=\"anchor\" id=\"10\"></a>\n",
    "First, we initialize the model as follows:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Model initialization  <a class=\"anchor\" id=\"11\"></a>\n",
    "\n",
    "The default, initialization of the TabPFN classifier model object is shown below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "tabpfn_classifier = MLModel(data, 'tabpfn.TabPFNClassifier', device='cpu', N_ensemble_configurations=32)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fit the model <a class=\"anchor\" id=\"12\"></a>\n",
    "\n",
    "Next, we will train the model.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "tabpfn_classifier.fit()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9901960784313726"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tabpfn_classifier.score()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see the model score is showing excellent results."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visualize results in validation set <a class=\"anchor\" id=\"13\"></a>\n",
    "\n",
    "It is a good practice to see the results of the model viz-a-viz ground truth. The code below picks random samples and shows us the `Activity` which is the ground truth or target state and model predicted `Activity_results` side by side. This enables us to preview the results of the model we trained."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Activity</th>\n",
       "      <th>LDA1</th>\n",
       "      <th>LDA2</th>\n",
       "      <th>LDA3</th>\n",
       "      <th>LDA4</th>\n",
       "      <th>LDA5</th>\n",
       "      <th>Activity_results</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>101</th>\n",
       "      <td>LAYING</td>\n",
       "      <td>-27.925746</td>\n",
       "      <td>17.698939</td>\n",
       "      <td>-0.211993</td>\n",
       "      <td>0.137931</td>\n",
       "      <td>-0.872844</td>\n",
       "      <td>LAYING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>299</th>\n",
       "      <td>WALKING</td>\n",
       "      <td>22.230659</td>\n",
       "      <td>0.572533</td>\n",
       "      <td>-8.318169</td>\n",
       "      <td>1.275016</td>\n",
       "      <td>0.139479</td>\n",
       "      <td>WALKING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>693</th>\n",
       "      <td>SITTING</td>\n",
       "      <td>-18.977440</td>\n",
       "      <td>-10.123049</td>\n",
       "      <td>0.551960</td>\n",
       "      <td>-0.028800</td>\n",
       "      <td>-3.010251</td>\n",
       "      <td>SITTING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>884</th>\n",
       "      <td>WALKING</td>\n",
       "      <td>23.685543</td>\n",
       "      <td>-0.261172</td>\n",
       "      <td>-11.338481</td>\n",
       "      <td>3.464041</td>\n",
       "      <td>-0.628183</td>\n",
       "      <td>WALKING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>967</th>\n",
       "      <td>SITTING</td>\n",
       "      <td>-18.285170</td>\n",
       "      <td>-11.017798</td>\n",
       "      <td>1.873937</td>\n",
       "      <td>-0.702550</td>\n",
       "      <td>-1.983438</td>\n",
       "      <td>SITTING</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Activity       LDA1       LDA2       LDA3      LDA4      LDA5  \\\n",
       "101   LAYING -27.925746  17.698939  -0.211993  0.137931 -0.872844   \n",
       "299  WALKING  22.230659   0.572533  -8.318169  1.275016  0.139479   \n",
       "693  SITTING -18.977440 -10.123049   0.551960 -0.028800 -3.010251   \n",
       "884  WALKING  23.685543  -0.261172 -11.338481  3.464041 -0.628183   \n",
       "967  SITTING -18.285170 -11.017798   1.873937 -0.702550 -1.983438   \n",
       "\n",
       "    Activity_results  \n",
       "101           LAYING  \n",
       "299          WALKING  \n",
       "693          SITTING  \n",
       "884          WALKING  \n",
       "967          SITTING  "
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tabpfn_classifier.show_results()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Predict using the TabPFN classifier model <a class=\"anchor\" id=\"14\"></a>\n",
    "\n",
    "Once the TabPFN classifier is trained on the smaller dataset of 1,020 samples, we can use it to predict the classes of a larger dataset containing 6,332 samples. Given TabPFN’s ability to process data efficiently with a single forward pass, it can handle this larger dataset quickly, classifying each sample based on the patterns learned during training. Since the model is optimized for fast and scalable predictions, it will generate class predictions for all samples. \n",
    "\n",
    "Before using the trained TabPFN model to predict the classes of the test dataset, we will first apply Linear Discriminant Analysis (LDA) to reduce the test data to the same feature space as the training data. This ensures consistency between the training and test datasets, enabling the trained TabPFN model to effectively classify the larger test sample."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(6332, 6)\n"
     ]
    }
   ],
   "source": [
    "# Align tset data with the train data format \n",
    "X = test_har_data.drop(columns=['Activity'])\n",
    "y = test_har_data['Activity']\n",
    "scaler = StandardScaler()\n",
    "X_scaled = scaler.fit_transform(X)\n",
    "lda = LinearDiscriminantAnalysis(n_components=min(100, len(set(y)) - 1)) \n",
    "X_reduced_lda = lda.fit_transform(X_scaled, y)\n",
    "X_test_lda_df = pd.DataFrame(X_reduced_lda, columns=[f'LDA{i+1}' for i in range(X_reduced_lda.shape[1])])\n",
    "X_test_lda_df['Activity'] = y.reset_index(drop=True)\n",
    "print(X_test_lda_df.shape) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>LDA1</th>\n",
       "      <th>LDA2</th>\n",
       "      <th>LDA3</th>\n",
       "      <th>LDA4</th>\n",
       "      <th>LDA5</th>\n",
       "      <th>Activity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-10.188443</td>\n",
       "      <td>-8.641377</td>\n",
       "      <td>0.606669</td>\n",
       "      <td>1.120983</td>\n",
       "      <td>3.836137</td>\n",
       "      <td>STANDING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-9.735631</td>\n",
       "      <td>-6.716675</td>\n",
       "      <td>0.537841</td>\n",
       "      <td>-0.543385</td>\n",
       "      <td>2.295157</td>\n",
       "      <td>STANDING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-8.954351</td>\n",
       "      <td>-7.376296</td>\n",
       "      <td>0.798942</td>\n",
       "      <td>-0.507465</td>\n",
       "      <td>2.508069</td>\n",
       "      <td>STANDING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-10.400401</td>\n",
       "      <td>-7.267321</td>\n",
       "      <td>1.035134</td>\n",
       "      <td>0.272738</td>\n",
       "      <td>2.034312</td>\n",
       "      <td>STANDING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-9.596161</td>\n",
       "      <td>-6.980061</td>\n",
       "      <td>0.480017</td>\n",
       "      <td>-0.284537</td>\n",
       "      <td>1.103180</td>\n",
       "      <td>STANDING</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        LDA1      LDA2      LDA3      LDA4      LDA5  Activity\n",
       "0 -10.188443 -8.641377  0.606669  1.120983  3.836137  STANDING\n",
       "1  -9.735631 -6.716675  0.537841 -0.543385  2.295157  STANDING\n",
       "2  -8.954351 -7.376296  0.798942 -0.507465  2.508069  STANDING\n",
       "3 -10.400401 -7.267321  1.035134  0.272738  2.034312  STANDING\n",
       "4  -9.596161 -6.980061  0.480017 -0.284537  1.103180  STANDING"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_test_lda_df.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Predict <a class=\"anchor\" id=\"15\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [],
   "source": [
    "activity_predicted_tabpfn = tabpfn_classifier.predict(X_test_lda_df, prediction_type='dataframe')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>LDA1</th>\n",
       "      <th>LDA2</th>\n",
       "      <th>LDA3</th>\n",
       "      <th>LDA4</th>\n",
       "      <th>LDA5</th>\n",
       "      <th>Activity</th>\n",
       "      <th>prediction_results</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>6327</th>\n",
       "      <td>18.095783</td>\n",
       "      <td>2.593775</td>\n",
       "      <td>8.704337</td>\n",
       "      <td>5.424257</td>\n",
       "      <td>0.443448</td>\n",
       "      <td>WALKING_DOWNSTAIRS</td>\n",
       "      <td>WALKING_DOWNSTAIRS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6328</th>\n",
       "      <td>17.094286</td>\n",
       "      <td>1.997284</td>\n",
       "      <td>5.270752</td>\n",
       "      <td>2.839847</td>\n",
       "      <td>0.550822</td>\n",
       "      <td>WALKING_DOWNSTAIRS</td>\n",
       "      <td>WALKING_DOWNSTAIRS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6329</th>\n",
       "      <td>15.909594</td>\n",
       "      <td>1.537803</td>\n",
       "      <td>4.887237</td>\n",
       "      <td>4.771153</td>\n",
       "      <td>-0.157321</td>\n",
       "      <td>WALKING_DOWNSTAIRS</td>\n",
       "      <td>WALKING_DOWNSTAIRS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6330</th>\n",
       "      <td>11.944985</td>\n",
       "      <td>0.834660</td>\n",
       "      <td>0.116338</td>\n",
       "      <td>-6.285236</td>\n",
       "      <td>0.045984</td>\n",
       "      <td>WALKING_UPSTAIRS</td>\n",
       "      <td>WALKING_UPSTAIRS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6331</th>\n",
       "      <td>14.575570</td>\n",
       "      <td>1.737412</td>\n",
       "      <td>-0.866397</td>\n",
       "      <td>-5.458011</td>\n",
       "      <td>-1.150021</td>\n",
       "      <td>WALKING_UPSTAIRS</td>\n",
       "      <td>WALKING_UPSTAIRS</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           LDA1      LDA2      LDA3      LDA4      LDA5            Activity  \\\n",
       "6327  18.095783  2.593775  8.704337  5.424257  0.443448  WALKING_DOWNSTAIRS   \n",
       "6328  17.094286  1.997284  5.270752  2.839847  0.550822  WALKING_DOWNSTAIRS   \n",
       "6329  15.909594  1.537803  4.887237  4.771153 -0.157321  WALKING_DOWNSTAIRS   \n",
       "6330  11.944985  0.834660  0.116338 -6.285236  0.045984    WALKING_UPSTAIRS   \n",
       "6331  14.575570  1.737412 -0.866397 -5.458011 -1.150021    WALKING_UPSTAIRS   \n",
       "\n",
       "      prediction_results  \n",
       "6327  WALKING_DOWNSTAIRS  \n",
       "6328  WALKING_DOWNSTAIRS  \n",
       "6329  WALKING_DOWNSTAIRS  \n",
       "6330    WALKING_UPSTAIRS  \n",
       "6331    WALKING_UPSTAIRS  "
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "activity_predicted_tabpfn.tail(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Accuracy assessment <a class=\"anchor\" id=\"16\"></a>\n",
    "\n",
    "Next, we will evaluate the model's performance. This will print out multiple model metrics that we can use to assess the model quality. These metrics include a combination of multiple evaluation criteria, such as `accuracy`, `precision`, `recall` and `F1-Score`, which collectively measure the model's performance on the validation set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Accuracy: 96.83%\n",
      "Precision: 0.97\n",
      "Recall: 0.97\n",
      "F1 Score: 0.97\n",
      "\n",
      "Classification Report:\n",
      "                    precision    recall  f1-score   support\n",
      "\n",
      "            LAYING       1.00      1.00      1.00      1219\n",
      "           SITTING       1.00      0.83      0.91      1119\n",
      "          STANDING       0.86      0.99      0.93      1197\n",
      "           WALKING       1.00      1.00      1.00      1031\n",
      "WALKING_DOWNSTAIRS       1.00      0.99      1.00       835\n",
      "  WALKING_UPSTAIRS       0.99      1.00      1.00       931\n",
      "\n",
      "          accuracy                           0.97      6332\n",
      "         macro avg       0.97      0.97      0.97      6332\n",
      "      weighted avg       0.97      0.97      0.97      6332\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Extract ground truth and predictions\n",
    "y_true = activity_predicted_tabpfn['Activity']\n",
    "y_pred = activity_predicted_tabpfn['prediction_results']\n",
    "\n",
    "# Calculate Accuracy\n",
    "accuracy = accuracy_score(y_true, y_pred)\n",
    "print(f'Accuracy: {accuracy * 100:.2f}%')\n",
    "\n",
    "# Calculate Precision \n",
    "precision = precision_score(y_true, y_pred, average='weighted', zero_division=0)\n",
    "print(f'Precision: {precision:.2f}')\n",
    "\n",
    "# Calculate Recall \n",
    "recall = recall_score(y_true, y_pred, average='weighted', zero_division=0)\n",
    "print(f'Recall: {recall:.2f}')\n",
    "\n",
    "# Calculate F1-Score \n",
    "f1 = f1_score(y_true, y_pred, average='weighted', zero_division=0)\n",
    "print(f'F1 Score: {f1:.2f}')\n",
    "\n",
    "# classification_report \n",
    "print(\"\\nClassification Report:\")\n",
    "print(classification_report(y_true, y_pred))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The performance metrics obtained from the trained TabPFN model on the test dataset of 6,332 samples indicate excellent classification quality.\n",
    "\n",
    "`Accuracy (96.81%)` : The model correctly classified approximately 97% of the samples, which is a strong indication of its ability to generalize well to unseen data, despite being trained on a smaller dataset of just 1,020 samples.\n",
    "\n",
    "`Precision (0.97)`      : Precision measures the proportion of true positive predictions among all positive predictions made by the model. A precision of 0.97 means that 97% of the predicted positive activity classes are correct, indicating that the model rarely makes false positive errors.\n",
    "\n",
    "`Recall (0.97)`      : Recall represents the model's ability to correctly identify all relevant instances of a class. A recall of 0.97 means that the model correctly identifies 97% of all actual positive instances, with minimal false negatives.\n",
    "\n",
    "`F1 Score (0.97)`      : The F1 Score is the harmonic mean of precision and recall, and a value of 0.97 shows that the model balances precision and recall very well. This indicates that the model is both highly accurate and sensitive in detecting the correct activity classes.\n",
    "\n",
    "Overall, these metrics demonstrate that the TabPFN model performs exceptionally well, achieving near-perfect classification with minimal errors. This performance is particularly impressive given that it was trained on a relatively small sample size of 1,020 data points, highlighting its efficiency and effectiveness in handling human activity recognition tasks."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion <a class=\"anchor\" id=\"17\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This project highlights the powerful capabilities of the TabPFN classifier for Human Activity Recognition (HAR) tasks. Even with a training dataset of just 1,020 samples, the model achieved impressive results on a larger test dataset of 6,332 samples, with an accuracy of 96.81%, and precision, recall, and F1 scores all reaching 0.97. The TabPFN model's speed, simplicity, and strong performance in classifying human activities, highlight its potential for applications in healthcare, fitness, smart cities and disaster relief operations, offering an efficient and scalable solution for HAR systems."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### TabPFN license information <a class=\"anchor\" id=\"18\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "| License Description |\n",
    "|:------------------- |\n",
    "| Built with TabPFN - tabpfn.TabPFNClassifier <https://priorlabs.ai/tabpfn-license> |"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "pro3.5_LearnLesson2025",
   "language": "python",
   "name": "pro3.5_learnlesson2025"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
